The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based, and of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, either of multiple identical models (61%) or of heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
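To make two of the most frequently reported strategies concrete, patch-based processing of samples that are too large to fit in memory and ensembling of models trained on different folds, the following is a minimal NumPy sketch. The patch size, stride, and toy fold models are illustrative assumptions, not details taken from any particular challenge submission.

```python
import numpy as np

def extract_patches(image, patch_size=128, stride=128):
    """Slide a window over a large 2D image and return the patches and their coordinates."""
    patches, coords = [], []
    h, w = image.shape
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            coords.append((y, x))
    return patches, coords

def ensemble_predict(models, patch):
    """Average the probability outputs of several models trained on different folds."""
    preds = [m(patch) for m in models]
    return np.mean(preds, axis=0)

# Toy usage: three "fold models" that each return a 2-class probability vector.
rng = np.random.default_rng(0)
image = rng.random((512, 512))  # stand-in for a sample too large to process at once
fold_models = [lambda p, b=b: np.array([0.5 - b, 0.5 + b]) for b in (0.0, 0.05, 0.1)]
patches, _ = extract_patches(image)
probs = np.mean([ensemble_predict(fold_models, p) for p in patches], axis=0)
print(probs)
```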
Real and fake news in various domains (e.g., politics, health, and entertainment) spreads daily via online social media, calling for fake news detection across multiple domains. Among these, fake news in specific domains such as politics and health has more serious potential negative real-world impact (e.g., an epidemic driven by COVID-19 misinformation). Previous studies focus on multi-domain fake news detection by jointly mining and modeling the correlations between domains. However, these multi-domain methods suffer from a seesaw problem: the performance of some domains often improves at the cost of hurting the performance of other domains, which can lead to unsatisfactory performance in specific domains. To address this issue, we propose a Domain- and Instance-level Transfer framework for Fake News Detection (DITFEND), which can improve the performance on a specific target domain. To transfer coarse-grained domain-level knowledge, we train a general model on data from all domains from a meta-learning perspective. To transfer fine-grained instance-level knowledge and adapt the general model to the target domain, we train a language model on the target domain to evaluate the transferability of each data instance in the source domains and re-weight the contribution of each instance. Offline experiments on two datasets demonstrate the effectiveness of DITFEND. Online experiments show that DITFEND brings further improvements over the base models in a real-world scenario.
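As a rough illustration of the instance-level transfer step, the sketch below re-weights source-domain instances by a transferability score and uses those weights in a detection loss. The score here is a generic stand-in (e.g., something derived from a target-domain language model such as negative perplexity); DITFEND's exact scoring and re-weighting formulation may differ.

```python
import torch
import torch.nn.functional as F

def transferability_weights(lm_scores, temperature=1.0):
    """Turn target-domain language-model scores for source instances into
    normalized instance weights (higher score means more transferable)."""
    return torch.softmax(torch.as_tensor(lm_scores, dtype=torch.float) / temperature, dim=0)

def weighted_detection_loss(logits, labels, weights):
    """Cross-entropy over source instances, re-weighted by transferability."""
    per_instance = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_instance).sum() / weights.sum()

# Toy usage with made-up scores and logits.
lm_scores = [2.1, 0.3, 1.5]  # e.g., negative perplexity under the target-domain LM (assumed)
weights = transferability_weights(lm_scores)
logits = torch.tensor([[1.2, -0.3], [0.1, 0.4], [0.8, 0.2]])
labels = torch.tensor([0, 1, 0])
print(weighted_detection_loss(logits, labels, weights))
```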
Multi-document scientific summarization (MDSS) aims to generate coherent and concise summaries for clusters of topic-related scientific papers. This task requires a precise understanding of paper content and accurate modeling of cross-paper relationships. Knowledge graphs convey compact and interpretable structured information about documents, which makes them well suited for content modeling and relationship modeling. In this paper, we present KGSum, an MDSS model centered on knowledge graphs during both the encoding and decoding process. Specifically, during encoding, two graph-based modules are proposed to incorporate knowledge graph information into paper encoding, while during decoding, we propose a two-stage decoder that first generates the knowledge graph of the summary in the form of descriptive sentences and then generates the final summary. Empirical results show that the proposed architecture brings substantial improvements over baselines on the Multi-XScience dataset.
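A minimal structural sketch of the two-stage decoding idea is shown below: one decoder first produces the summary's knowledge-graph description, and a second decoder produces the final summary conditioned on both the paper encodings and the stage-1 states. All dimensions, the vocabulary size, and the choice to concatenate memories are assumptions for illustration and are not taken from the KGSum implementation.

```python
import torch
import torch.nn as nn

class TwoStageDecoder(nn.Module):
    """Skeleton of a two-stage decoding scheme: stage 1 decodes a textual
    description of the summary's knowledge graph, stage 2 decodes the final
    summary conditioned on the paper encodings plus the stage-1 states."""
    def __init__(self, d_model=256, nhead=4, num_layers=2, vocab=10000):
        super().__init__()
        self.kg_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.sum_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.embed = nn.Embedding(vocab, d_model)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, paper_memory, kg_tokens, summary_tokens):
        # Stage 1: decode knowledge-graph sentences against the paper encodings.
        kg_states = self.kg_decoder(self.embed(kg_tokens), paper_memory)
        # Stage 2: decode the final summary against papers + stage-1 states.
        memory = torch.cat([paper_memory, kg_states], dim=1)
        sum_states = self.sum_decoder(self.embed(summary_tokens), memory)
        return self.out(kg_states), self.out(sum_states)

# Dummy forward pass: batch of 2, 50 encoded source tokens, 20 KG / 30 summary tokens.
model = TwoStageDecoder()
memory = torch.randn(2, 50, 256)
kg_logits, sum_logits = model(memory, torch.randint(0, 10000, (2, 20)), torch.randint(0, 10000, (2, 30)))
print(kg_logits.shape, sum_logits.shape)
```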
Human motion transfer refers to synthesizing photo-realistic and temporally coherent videos that enable one person to imitate the motion of others. However, current synthetic videos suffer from temporal inconsistency across sequential frames, which significantly degrades video quality yet is far from being solved by existing methods operating in the pixel domain. Recently, some works on DeepFake detection have attempted to distinguish natural from synthetic images in the frequency domain, exploiting the frequency insufficiency of image synthesis methods. Nevertheless, the temporal inconsistency of synthetic videos has not been studied from the perspective of the frequency-domain gap between natural and synthetic videos. In this paper, we propose to delve into the frequency space for temporally consistent human motion transfer. First, we conduct the first comprehensive analysis of natural and synthetic videos in the frequency domain to reveal the frequency gap in both the spatial dimension of individual frames and the temporal dimension of the video. To close the frequency gap between natural and synthetic videos, we propose a novel frequency-based human motion transfer framework, named FreMOTR, which can effectively mitigate spatial artifacts as well as the temporal inconsistency of synthesized videos. FreMOTR explores two novel frequency-based regularization modules: 1) Frequency-domain Appearance Regularization (FAR) to improve the appearance of the person in individual frames, and 2) Temporal Frequency Regularization (TFR) to guarantee temporal consistency between adjacent frames. Finally, comprehensive experiments demonstrate that FreMOTR not only yields superior performance on temporal consistency metrics but also improves the frame-level visual quality of synthetic videos. In particular, the temporal consistency metrics are improved by nearly 30% over the state-of-the-art model.
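A hedged sketch of the two regularizers is given below, under the assumption that both compare amplitude spectra with an L1 penalty: FAR matches the 2D spectrum of each synthesized frame to its reference frame, and TFR matches the frame-to-frame spectral change of the synthesized clip to that of the natural clip. The exact loss forms and weighting in FreMOTR may differ.

```python
import torch

def far_loss(fake_frame, real_frame):
    """Frequency-domain Appearance Regularization (sketch): match the 2D
    amplitude spectra of a synthesized frame and its reference frame."""
    fake_amp = torch.abs(torch.fft.fft2(fake_frame, dim=(-2, -1)))
    real_amp = torch.abs(torch.fft.fft2(real_frame, dim=(-2, -1)))
    return torch.mean(torch.abs(fake_amp - real_amp))

def tfr_loss(fake_frames, real_frames):
    """Temporal Frequency Regularization (sketch): make the frame-to-frame
    spectral change of the synthesized clip follow that of the natural clip."""
    fake_diff = torch.abs(torch.fft.fft2(fake_frames[1:])) - torch.abs(torch.fft.fft2(fake_frames[:-1]))
    real_diff = torch.abs(torch.fft.fft2(real_frames[1:])) - torch.abs(torch.fft.fft2(real_frames[:-1]))
    return torch.mean(torch.abs(fake_diff - real_diff))

# Toy usage on a clip of 8 grayscale frames of size 64x64 (assumed layout: T x H x W).
fake = torch.rand(8, 64, 64)
real = torch.rand(8, 64, 64)
loss = far_loss(fake[0], real[0]) + 0.1 * tfr_loss(fake, real)
print(loss.item())
```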
False news disseminated on social media has surged over the past few years and has caused multiple real-world threats. Although there are studies on false news in specific domains (such as politics or health care), little work compares false news across domains. In this paper, we investigate false news across nine domains on Weibo, the largest Twitter-like social media platform in China, from 2009 to 2019. The newly collected data comprise 44,728 posts published by 40,215 users and reposted over 3.4 million times. Based on the distribution and spread of the multi-domain dataset, we observe that false news in domains tied to daily life, such as health and medicine, generated more posts but spread less effectively than those in other domains such as politics, with political false news having the most effective capacity for dissemination. Widely spread false-news posts on Weibo are associated with certain types of users (by gender, age, etc.). Moreover, these posts provoked strong emotions in the reposts and spread further with the active engagement of false-news starters. Our findings have the potential to help design false news detection systems for suspicious news discovery, veracity prediction, and display and explanation. The comparison of the findings on Weibo with those of existing work reveals nuanced patterns, suggesting the need for more research on data from diverse platforms, countries, or languages to tackle the global issue of false news. The code and the new anonymized dataset are available at https://github.com/ictmcg/characterizing-weibo-multi-domain-false-news.
Deep learning-based NLP models have been found vulnerable to word substitution perturbations. Before they are widely adopted, the fundamental issue of robustness needs to be addressed. Along this line, we propose a formal framework to evaluate word-level robustness. First, to study the safe region of a model, we introduce the robustness radius, which is the boundary within which the model can resist any perturbation. Since computing the maximum robustness radius is computationally hard, we estimate its upper and lower bounds. We repurpose attack methods as a way of seeking an upper bound and design a pseudo-dynamic programming algorithm for a tighter upper bound. Verification methods are then used for the lower bound. Furthermore, to evaluate robustness in the region beyond the safe radius, we re-examine robustness from another view: quantification. A robustness metric with rigorous statistical guarantees is introduced to measure the quantity of adversarial examples, which indicates the model's sensitivity to perturbations outside the safe radius. This metric helps us figure out why state-of-the-art models like BERT can be easily fooled by a few word substitutions, yet generalize well in the presence of real-world noise.
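The snippet below only illustrates the "attack as an upper bound" idea in its simplest form: the smallest number of word substitutions found to flip a prediction is an upper bound on the robustness radius. The brute-force search and the toy classifier are assumptions for illustration; the paper's pseudo-dynamic programming algorithm and verification-based lower bound are substantially more sophisticated.

```python
import itertools

def robustness_radius_upper_bound(predict, words, substitutes, max_radius=3):
    """Attack-based upper bound on the word-level robustness radius (sketch):
    return the smallest number of word substitutions found that flips the
    prediction. `predict` maps a list of words to a label; `substitutes[i]`
    lists allowed replacements for position i. Exhaustive over small radii."""
    original = predict(words)
    for radius in range(1, max_radius + 1):
        for positions in itertools.combinations(range(len(words)), radius):
            options = [substitutes.get(i, []) for i in positions]
            for choice in itertools.product(*options):
                perturbed = list(words)
                for pos, sub in zip(positions, choice):
                    perturbed[pos] = sub
                if predict(perturbed) != original:
                    return radius  # an adversarial example exists at this radius
    return None  # no flip found within max_radius

# Toy usage with a keyword-counting "classifier" (purely illustrative).
def toy_predict(ws):
    return int(sum(w in {"good", "great"} for w in ws) > sum(w in {"bad", "awful"} for w in ws))

sentence = ["the", "movie", "was", "good"]
subs = {3: ["awful", "fine"]}
print(robustness_radius_upper_bound(toy_predict, sentence, subs))
```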
Fake news spreads widely on social media across various domains, which leads to real-world threats in many areas such as politics, disasters, and finance. Most existing approaches focus on single-domain fake news detection (SFND) and yield unsatisfactory performance when applied to multi-domain fake news detection. As an emerging field, multi-domain fake news detection (MFND) is attracting increasing attention. However, data distributions, such as word frequency and propagation patterns, vary from domain to domain, i.e., there is domain shift. Facing the challenge of serious domain shift, existing fake news detection techniques perform poorly in multi-domain scenarios. Therefore, a specialized model for MFND is needed. In this paper, we first design a benchmark fake news dataset for MFND with domain labels, namely Weibo21, which consists of 4,488 fake news and 4,640 real news pieces from 9 different domains. We further propose a Multi-domain Fake News Detection model (MDFEND) that utilizes a domain gate to aggregate multiple representations extracted by a mixture of experts. Experiments show that MDFEND can significantly improve the performance of multi-domain fake news detection. Our dataset and code are available at https://github.com/kennqiang/mdfend-weibo21.
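A compact sketch of the expert-plus-domain-gate idea is given below. MDFEND itself builds its experts on top of token-level text encodings; here the experts are plain MLPs over a pooled post embedding, and all layer sizes are placeholder assumptions.

```python
import torch
import torch.nn as nn

class DomainGatedMoE(nn.Module):
    """Sketch of a mixture-of-experts fake news classifier with a domain gate:
    each expert produces one representation of the post, and a gate conditioned
    on the domain embedding (plus the post embedding) mixes them."""
    def __init__(self, input_dim=768, hidden=256, num_experts=5, num_domains=9):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU()) for _ in range(num_experts)])
        self.domain_embed = nn.Embedding(num_domains, 64)
        self.gate = nn.Sequential(nn.Linear(input_dim + 64, num_experts), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(hidden, 2)

    def forward(self, post_emb, domain_id):
        expert_out = torch.stack([e(post_emb) for e in self.experts], dim=1)  # (B, E, hidden)
        gate_in = torch.cat([post_emb, self.domain_embed(domain_id)], dim=-1)
        weights = self.gate(gate_in).unsqueeze(-1)                            # (B, E, 1)
        mixed = (weights * expert_out).sum(dim=1)                             # (B, hidden)
        return self.classifier(mixed)

# Toy forward pass: batch of 4 pooled post embeddings with domain ids.
model = DomainGatedMoE()
logits = model(torch.randn(4, 768), torch.tensor([0, 3, 3, 8]))
print(logits.shape)  # torch.Size([4, 2])
```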
Contrastive learning (CL) has become a dominant technique for unsupervised representation learning, which pulls augmented versions of an anchor close to each other (positive samples) and pushes the embeddings of other samples (negatives) apart. As recent studies reveal, CL can benefit from hard negatives (negatives that are similar to the anchor). However, we observe only limited benefits when adopting existing hard negative mining techniques from other domains in graph contrastive learning (GCL). We perform both experimental and theoretical analysis of this phenomenon and find that it can be attributed to the message passing of graph neural networks (GNNs). Unlike CL in other domains, most hard negatives are potentially false negatives (negatives that share the same class as the anchor) if they are selected merely on the basis of similarity between the anchor and themselves, which unnecessarily pushes away samples of the same class. To remedy this deficiency, we propose an effective method, dubbed ProGCL, to estimate the probability of a negative being a true one, which together with similarity constitutes a more suitable measure of a negative's hardness. In addition, we devise two schemes (i.e., ProGCL-weight and ProGCL-mix) to boost the performance of GCL. Extensive experiments demonstrate that ProGCL brings notable and consistent improvements over base GCL methods and yields multiple state-of-the-art results on several unsupervised benchmarks, even exceeding the performance of supervised ones. Moreover, ProGCL is readily pluggable into various negatives-based GCL methods for performance improvement. We release the code at https://github.com/junxia97/progcl.
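The following sketch illustrates the ProGCL-weight idea of down-weighting negatives that are likely false negatives. ProGCL fits a beta mixture to the similarity distribution; a scikit-learn Gaussian mixture is used here purely as a simplification, and the weighted InfoNCE form is an illustrative assumption rather than the paper's exact objective.

```python
import numpy as np
import torch
from sklearn.mixture import GaussianMixture

def true_negative_probs(similarities):
    """Fit a two-component mixture to anchor-negative similarities and return,
    for each negative, the posterior probability of the low-similarity
    ("true negative") component. ProGCL uses a beta mixture; a Gaussian
    mixture is used here as a simplification."""
    sims = np.asarray(similarities).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(sims)
    true_neg_comp = int(np.argmin(gmm.means_.ravel()))  # lower-mean component
    return gmm.predict_proba(sims)[:, true_neg_comp]

def weighted_infonce(anchor, positive, negatives, weights, tau=0.2):
    """InfoNCE where each negative's contribution is down-weighted by the
    estimated probability that it is actually a false negative."""
    pos = torch.exp(torch.dot(anchor, positive) / tau)
    neg_sims = torch.exp(negatives @ anchor / tau)
    w = torch.as_tensor(weights, dtype=neg_sims.dtype)
    return -torch.log(pos / (pos + (w * neg_sims).sum()))

# Toy usage with random embeddings.
torch.manual_seed(0)
anchor, positive = torch.randn(64), torch.randn(64)
negatives = torch.randn(32, 64)
sims = (negatives @ anchor / (negatives.norm(dim=1) * anchor.norm())).numpy()
print(weighted_infonce(anchor, positive, negatives, true_negative_probs(sims)))
```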
Video recognition in an open and dynamic world is quite challenging, as we need to handle different settings such as close-set, long-tail, few-shot and open-set. By leveraging semantic knowledge from noisy text descriptions crawled from the Internet, we focus on the general video recognition (GVR) problem of solving different recognition tasks within a unified framework. The core contribution of this paper is twofold. First, we build a comprehensive video recognition benchmark of Kinetics-GVR, including four sub-task datasets to cover the mentioned settings. To facilitate the research of GVR, we propose to utilize external textual knowledge from the Internet and provide multi-source text descriptions for all action classes. Second, inspired by the flexibility of language representation, we present a unified visual-linguistic framework (VLG) to solve the problem of GVR by an effective two-stage training paradigm. Our VLG is first pre-trained on video and language datasets to learn a shared feature space, and then devises a flexible bi-modal attention head to integrate high-level semantic concepts under different settings. Extensive results show that our VLG obtains state-of-the-art performance under all four settings. The superior performance demonstrates the effectiveness and generalization ability of our proposed framework. We hope our work takes a step towards general video recognition and can serve as a baseline for future research. The code and models will be available at https://github.com/MCG-NJU/VLG.
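A minimal sketch of what a bi-modal attention head could look like is given below: video tokens attend to textual class embeddings, and recognition logits are scaled cosine similarities between the pooled video feature and each class's text embedding. The dimensions, residual pooling, and logit scale are assumptions; VLG's actual head and two-stage pre-training are more involved.

```python
import torch
import torch.nn as nn

class BiModalAttentionHead(nn.Module):
    """Sketch of a bi-modal attention head: video tokens attend to textual
    class embeddings, and class scores are cosine similarities between the
    pooled video feature and each class's text embedding."""
    def __init__(self, dim=512, nhead=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, nhead, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_tokens, class_text_emb, scale=100.0):
        text = class_text_emb.unsqueeze(0).expand(video_tokens.size(0), -1, -1)  # (B, C, dim)
        attended, _ = self.attn(video_tokens, text, text)   # video tokens query class texts
        video_feat = self.norm(attended + video_tokens).mean(dim=1)              # (B, dim)
        video_feat = nn.functional.normalize(video_feat, dim=-1)
        class_feat = nn.functional.normalize(class_text_emb, dim=-1)             # (C, dim)
        return scale * video_feat @ class_feat.t()                               # (B, C) logits

# Toy usage: 2 clips of 16 frame tokens, 10 action classes described by text embeddings.
head = BiModalAttentionHead()
logits = head(torch.randn(2, 16, 512), torch.randn(10, 512))
print(logits.shape)  # torch.Size([2, 10])
```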
High-definition (HD) semantic map generation of the environment is an essential component of autonomous driving. Existing methods have achieved good performance in this task by fusing different sensor modalities, such as LiDAR and camera. However, current works are based on raw data or network feature-level fusion and only consider short-range HD map generation, limiting their deployment to realistic autonomous driving applications. In this paper, we focus on the task of building the HD maps in both short ranges, i.e., within 30 m, and also predicting long-range HD maps up to 90 m, which is required by downstream path planning and control tasks to improve the smoothness and safety of autonomous driving. To this end, we propose a novel network named SuperFusion, exploiting the fusion of LiDAR and camera data at multiple levels. We benchmark our SuperFusion on the nuScenes dataset and a self-recorded dataset and show that it outperforms the state-of-the-art baseline methods by large margins. Furthermore, we propose a new metric to evaluate long-range HD map prediction and apply the generated HD map to a downstream path planning task. The results show that by using the long-range HD maps predicted by our method, we can achieve better path planning for autonomous vehicles. The code will be available at https://github.com/haomo-ai/SuperFusion.
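To illustrate the idea of fusing LiDAR and camera information at more than one level, the sketch below concatenates camera features with LiDAR BEV features both early and after further encoding, then predicts per-cell map classes. All channel sizes are placeholders, and the camera features are assumed to be already projected into bird's-eye view; SuperFusion's actual view transformation, alignment, and long-range prediction components are considerably richer.

```python
import torch
import torch.nn as nn

class MultiLevelFusion(nn.Module):
    """Sketch of fusing LiDAR BEV features with camera features at two levels:
    an early concatenation of low-level feature maps and a later fusion of
    higher-level maps, followed by a small head predicting semantic map classes."""
    def __init__(self, lidar_ch=64, cam_ch=64, mid_ch=128, num_classes=3):
        super().__init__()
        self.low_fuse = nn.Sequential(nn.Conv2d(lidar_ch + cam_ch, mid_ch, 3, padding=1), nn.ReLU())
        self.encode = nn.Sequential(nn.Conv2d(mid_ch, mid_ch, 3, stride=2, padding=1), nn.ReLU())
        self.high_fuse = nn.Sequential(nn.Conv2d(mid_ch + cam_ch, mid_ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(mid_ch, num_classes, 1)

    def forward(self, lidar_bev, cam_low, cam_high):
        x = self.low_fuse(torch.cat([lidar_bev, cam_low], dim=1))  # early feature-level fusion
        x = self.encode(x)
        x = self.high_fuse(torch.cat([x, cam_high], dim=1))        # deeper feature-level fusion
        return self.head(x)                                        # per-cell map logits

# Toy forward pass on a 200x100 BEV grid (camera features assumed already in BEV).
model = MultiLevelFusion()
out = model(torch.randn(1, 64, 200, 100), torch.randn(1, 64, 200, 100), torch.randn(1, 64, 100, 50))
print(out.shape)  # torch.Size([1, 3, 100, 50])
```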